Cobain GIC

Overview:

This page contains the results of CoNGA analyses. Results in tables may have been filtered to reduce redundancy, focus on the most important columns, and limit length; full tables should exist as OUTFILE_PREFIX*.tsv files.

Command:

scripts/run_conga.py --all --gex_data /scratch.global/ben_testing/ben_tcr/Cobain_PBMC/outs/filtered_feature_bc_matrix.h5 --gex_data_type 10x_h5 --clones_file Cobain_PMBC_TCR --organism human --outfile_prefix Cobain_PMBC_Final
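
The full tables mentioned in the overview can be collected programmatically. The following is a minimal sketch, assuming the run wrote its output to the current directory; the helper names `list_output_tables` and `read_tsv` are hypothetical, and only the prefix `Cobain_PMBC_Final` comes from the command above.

```python
import csv
import glob
import os


def list_output_tables(directory, outfile_prefix):
    """Return sorted paths of .tsv files whose names start with outfile_prefix."""
    pattern = os.path.join(glob.escape(directory),
                           glob.escape(outfile_prefix) + "*.tsv")
    return sorted(glob.glob(pattern))


def read_tsv(path):
    """Read a tab-separated file into a list of dicts keyed by the header row."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


if __name__ == "__main__":
    # "Cobain_PMBC_Final" is the --outfile_prefix used above; "." is an
    # assumption about where the output files were written.
    for path in list_output_tables(".", "Cobain_PMBC_Final"):
        print(path, len(read_tsv(path)), "rows")
```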

Stats

num_cells_w_gex: 11385
num_features_start: 26530
num_cells_w_tcr: 392
min_genes_per_cell: 200
max_genes_per_cell: 3500
max_percent_mito: 0.1
num_filt_max_genes_per_cell: 14
num_filt_max_percent_mito: 0
num_antibody_features: 0
num_TR_genes: 36
num_TR_genes_in_hvg_set: 35
num_highly_variable_genes: 2091
num_cells_after_filtering: 378
num_clonotypes: 334
max_clonotype_size: 10
num_singleton_clonotypes: 314
nbr_frac_for_nndists: 0.1
num_gvg_hit_clonotypes: 4
num_gvg_hit_biclusters: 0

graph_vs_graph_stats


Here we are assessing overall graph-vs-graph correlation by looking at the shared edges between TCR and GEX neighbor graphs and comparing that observed number to the number we would expect if the graphs were completely uncorrelated. Our null model for uncorrelated graphs is to take the vertices of one graph and randomly renumber them (permute their labels). We compare the observed overlap to that expected under this null model by computing a Z-score, either by permuting one of the graph's vertices many times to get a mean and standard deviation of the overlap distribution, or, for large graphs where this is time consuming, by using a regression model for the standard deviation. The different rows of this table correspond to the different graph-graph comparisons that we make in the conga graph-vs-graph analysis: we compare K-nearest-neighbor graphs for GEX and TCR at different K values ("nbr_frac" aka neighbor-fraction, which reports K as a fraction of the total number of clonotypes) to each other and to GEX and TCR "cluster" graphs in which each clonotype is connected to all the other clonotypes with the same (GEX or TCR) cluster assignment. For two K values (the default), this gives 2*3=6 comparisons: GEX KNN graph vs TCR KNN graph, GEX cluster graph vs TCR KNN graph, and GEX KNN graph vs TCR cluster graph, for each of the two K values (aka nbr_fracs).
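
The permutation null described above can be sketched in a few lines of Python. This is an illustration only, not CoNGA's implementation: the toy graph builder, the function names, and the choice of 100 shuffles are all assumptions made for the example.

```python
import random
import statistics


def knn_like_graph(num_nodes, k, rng):
    """Toy directed graph: every node points to k randomly chosen other nodes."""
    edges = set()
    for node in range(num_nodes):
        others = [n for n in range(num_nodes) if n != node]
        for nbr in rng.sample(others, k):
            edges.add((node, nbr))
    return edges


def overlap_zscore(gex_edges, tcr_edges, num_nodes, num_shuffles=100, seed=0):
    """Z-score of the shared-edge count against a vertex-relabeling null."""
    rng = random.Random(seed)
    observed = len(gex_edges & tcr_edges)
    null_overlaps = []
    for _ in range(num_shuffles):
        # Randomly renumber the vertices of one graph (permute its labels).
        relabel = list(range(num_nodes))
        rng.shuffle(relabel)
        shuffled = {(relabel[a], relabel[b]) for a, b in tcr_edges}
        null_overlaps.append(len(gex_edges & shuffled))
    mean = statistics.mean(null_overlaps)
    sdev = statistics.stdev(null_overlaps)
    return observed, mean, sdev, (observed - mean) / sdev


if __name__ == "__main__":
    rng = random.Random(1)
    graph = knn_like_graph(60, 3, rng)
    # Comparing a graph with itself should give a large positive Z-score.
    print(overlap_zscore(graph, graph, 60))
```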

The column to look at is *overlap_zscore*. Higher values indicate more significant GEX/TCR covariation, with "interesting" levels starting around zscores of 3-5.

Columns in more detail:

graph_overlap_type: KNN ("nbr") or cluster versus KNN ("nbr") or cluster

nbr_frac: the K value for the KNN graph, as a fraction of total clonotypes

overlap: the observed overlap (number of shared edges) between GEX and TCR graphs

expected_overlap: the expected overlap under a shuffled null model.

overlap_zscore: a Z-score for the observed overlap computed by subtracting the expected overlap and dividing by the standard deviation estimated from shuffling.
overlap expected_overlap overlap_mean overlap_sdev overlap_zscore overlap_zscore_fitted overlap_zscore_source nodes calculation_time calculation_time_fitted gex_edges tcr_edges gex_indegree_variance gex_indegree_skewness gex_indegree_kurtosis tcr_indegree_variance tcr_indegree_skewness tcr_indegree_kurtosis indegree_correlation_R indegree_correlation_P nbr_frac graph_overlap_type
12 9.027027 9.03 2.984812 0.995038 1.537755 shuffling 334 0.253149 0.000532 1002 1002 1.513514 1.787134 3.206418 0.483817 0.651398 0.196551 0.005069 0.926465 0.01 gex_nbr_vs_tcr_nbr
87 95.801802 94.96 9.957831 -0.799371 -1.380358 shuffling 334 0.110898 0.006273 1002 10634 1.513514 1.787134 3.206418 0.085341 0.483369 -0.648367 -0.084331 0.124006 0.01 gex_nbr_vs_tcr_cluster
224 212.162162 214.04 15.407738 0.646428 1.541907 shuffling 334 0.212288 0.014392 23550 1002 0.063494 -0.032034 -1.116034 0.483817 0.651398 0.196551 -0.035802 0.514364 0.01 gex_cluster_vs_tcr_nbr
1108 1092.270270 1093.43 39.772919 0.366330 0.386043 shuffling 334 0.138640 0.064915 11022 11022 1.016046 1.260377 1.081848 0.250537 1.124448 1.896752 -0.029170 0.595273 0.10 gex_nbr_vs_tcr_nbr
1081 1053.819820 1058.30 36.186876 0.627299 0.898448 shuffling 334 0.131173 0.062530 11022 10634 1.016046 1.260377 1.081848 0.085341 0.483369 -0.648367 -0.102237 0.061994 0.10 gex_nbr_vs_tcr_cluster
2425 2333.783784 2332.74 63.325448 1.456918 2.674585 shuffling 334 0.231508 0.143454 23550 11022 0.063494 -0.032034 -1.116034 0.250537 1.124448 1.896752 -0.028324 0.605987 0.10 gex_cluster_vs_tcr_nbr
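
As a worked check of the column definitions, the snippet below recomputes overlap_zscore from overlap, overlap_mean, and overlap_sdev for three rows of the table. It also shows that the tabulated expected_overlap values are consistent with gex_edges * tcr_edges / (nodes * (nodes - 1)). That second formula is inferred from the numbers in this table (assuming directed edges without self-loops) and is not taken from the CoNGA source.

```python
# (overlap, overlap_mean, overlap_sdev, overlap_zscore,
#  nodes, gex_edges, tcr_edges, expected_overlap), copied from the table.
ROWS = [
    (12, 9.03, 2.984812, 0.995038, 334, 1002, 1002, 9.027027),
    (87, 94.96, 9.957831, -0.799371, 334, 1002, 10634, 95.801802),
    (2425, 2332.74, 63.325448, 1.456918, 334, 23550, 11022, 2333.783784),
]


def zscore(overlap, mean, sdev):
    """Observed overlap minus the shuffled mean, in units of the shuffled sdev."""
    return (overlap - mean) / sdev


def expected_overlap(gex_edges, tcr_edges, nodes):
    # Assumption: directed edges, no self-loops, so nodes * (nodes - 1)
    # possible edges, each present independently in the two graphs.
    return gex_edges * tcr_edges / (nodes * (nodes - 1))


if __name__ == "__main__":
    for overlap, mean, sdev, z_tab, nodes, gex_e, tcr_e, exp_tab in ROWS:
        print(round(zscore(overlap, mean, sdev), 6), z_tab,
              round(expected_overlap(gex_e, tcr_e, nodes), 6), exp_tab)
```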

graph_vs_graph


Graph vs graph analysis looks for correlation between GEX and TCR space by finding statistically significant overlap between two similarity graphs, one defined by GEX similarity and one by TCR sequence similarity.

Overlap is defined one node (clonotype) at a time by looking for overlap between that node's neighbors in the GEX graph and its neighbors in the TCR graph. The null model is that the two neighbor sets are chosen independently at random.

CoNGA looks at two kinds of graphs: K nearest neighbor (KNN) graphs, where K = neighborhood size is specified as a fraction of the number of clonotypes (defaults for K are 0.01 and 0.1), and cluster graphs, where each clonotype is connected to all the other clonotypes in the same (GEX or TCR) cluster. Overlaps are computed 3 ways (GEX KNN vs TCR KNN, GEX KNN vs TCR cluster, and GEX cluster vs TCR KNN), for each of the K values (called nbr_fracs short for neighbor fractions).
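
To make the relationship between nbr_frac and K concrete, here is a small sketch using the 334 clonotypes reported in the Stats section. The rounding rule (truncation, with a floor of one neighbor) is an assumption; it is consistent with the K=3 and K=33 neighborhood sizes and the 1002 and 11022 KNN edge counts reported elsewhere on this page.

```python
def neighborhood_size(nbr_frac, num_clonotypes):
    # Assumption: truncate toward zero and keep at least one neighbor.
    return max(1, int(nbr_frac * num_clonotypes))


if __name__ == "__main__":
    num_clonotypes = 334  # num_clonotypes from the Stats section
    for nbr_frac in (0.01, 0.1):
        k = neighborhood_size(nbr_frac, num_clonotypes)
        # Each clonotype contributes k directed edges to the KNN graph.
        print(nbr_frac, k, k * num_clonotypes)
    # prints "0.01 3 1002" and "0.1 33 11022"
```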

Columns (depend slightly on whether hit is KNN v KNN or KNN v cluster):

conga_score: P value for GEX/TCR overlap * number of clonotypes

mait_fraction: fraction of the overlap made up of 'invariant' T cells

num_neighbors*: size of neighborhood (K)

cluster_size: size of cluster (for KNN v cluster graph overlaps)

clone_index: 0-index of clonotype in adata object


conga_score num_neighbors_gex num_neighbors_tcr overlap overlap_corrected mait_fraction clone_index nbr_frac graph_overlap_type cluster_size gex_cluster tcr_cluster va ja cdr3a vb jb cdr3b
0.054270 3.0 3.0 2 2 1.0 11 0.01 gex_nbr_vs_tcr_nbr NaN 1 0 TRAV1-2*01 TRAJ33*01 CAVVDSNYQLIW TRBV6-3*01 TRBJ2-4*01 CASSYSGQGELVWTQYF
0.358424 3.0 NaN 3 3 0.0 118 0.01 gex_nbr_vs_tcr_cluster 35.0 1 2 TRAV17*01 TRAJ30*01 CATARDDKIIF TRBV7-2*01 TRBJ2-1*01 CASSQTGLDEQFF
0.827085 NaN 33.0 17 17 0.0 25 0.10 gex_cluster_vs_tcr_nbr 94.0 0 4 TRAV12-1*01 TRAJ39*01 CAVRIGRAGNVLTF TRBV12-2*01 TRBJ1-1*01 CASRNRGGTEAFF
0.827085 NaN 33.0 17 17 0.0 263 0.10 gex_cluster_vs_tcr_nbr 94.0 0 1 TRAV6*01 TRAJ36*01 CALYTGVNNLFF TRBV20-1*01 TRBJ1-3*01 CSARGYRGASGNTVYF
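
The null model of two independently chosen neighbor sets corresponds to a hypergeometric tail probability. The sketch below assumes that neighbors are drawn from the other num_clonotypes - 1 clonotypes and that conga_score is that tail probability multiplied by the number of clonotypes. Under those assumptions it reproduces the first two rows of the table (0.054270 and 0.358424); the remaining rows were not checked, and the function names are hypothetical.

```python
from math import comb


def hypergeom_tail(overlap, population, num_a, num_b):
    """P(X >= overlap) when sets of size num_a and num_b are drawn
    independently at random from `population` items."""
    total = comb(population, num_b)
    upper = min(num_a, num_b)
    hits = sum(
        comb(num_a, k) * comb(population - num_a, num_b - k)
        for k in range(overlap, upper + 1)
    )
    return hits / total


def conga_score(overlap, num_clonotypes, num_a, num_b):
    # Assumption: the neighbor sets come from the other num_clonotypes - 1
    # clonotypes, and the P value is multiplied by num_clonotypes.
    return num_clonotypes * hypergeom_tail(overlap, num_clonotypes - 1,
                                           num_a, num_b)


if __name__ == "__main__":
    # Row 1: 3 GEX neighbors, 3 TCR neighbors, overlap 2, 334 clonotypes.
    print(conga_score(2, 334, 3, 3))
    # Row 2: 3 GEX neighbors, TCR cluster of size 35, overlap 3.
    print(conga_score(3, 334, 3, 35))
```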

tcr_clumping


This table stores the results of the TCR "clumping" analysis, which looks for neighborhoods in TCR space with more TCRs than expected by chance under a simple null model of VDJ rearrangement.

For each TCR in the dataset, we count how many TCRs are within a set of fixed TCRdist radii (defaults: 24, 48, 72, 96) and compare that number to the number expected given the size of the dataset, using a Poisson model. The approach is inspired by the ALICE and TCRnet methods.

Columns:

clump_type: 'global' unless we are optionally looking for TCR clumps within the individual GEX clusters

num_nbrs: neighborhood size (number of other TCRs with TCRdist

clump_type clone_index nbr_radius pvalue_adj num_nbrs expected_num_nbrs raw_count va ja cdr3a vb jb cdr3b clonotype_fdr_value clumping_group clusters_gex clusters_tcr
global 115 24 0.000007 2 0.000103 777.0 TRAV17*01 TRAJ27*01 CATDANADKLTF TRBV19*01 TRBJ2-4*01 CASGQGGQNTQYF 0.000006 1 1 2
global 116 24 0.000013 2 0.000141 1058.0 TRAV17*01 TRAJ27*01 CATDTNADKLTF TRBV19*01 TRBJ2-4*01 CASGAGGQNTQYF 0.000006 1 1 2
global 117 24 0.000017 2 0.000161 1209.0 TRAV17*01 TRAJ27*01 CATDTNADKLTF TRBV19*01 TRBJ2-4*01 CATGQGGQNTQYF 0.000006 1 1 2
global 115 48 0.000739 2 0.001052 7899.0 TRAV17*01 TRAJ27*01 CATDANADKLTF TRBV19*01 TRBJ2-4*01 CASGQGGQNTQYF 0.000006 1 1 2
global 116 48 0.000807 2 0.001100 8257.0 TRAV17*01 TRAJ27*01 CATDTNADKLTF TRBV19*01 TRBJ2-4*01 CASGAGGQNTQYF 0.000006 1 1 2
global 117 48 0.000971 2 0.001206 9053.0 TRAV17*01 TRAJ27*01 CATDTNADKLTF TRBV19*01 TRBJ2-4*01 CATGQGGQNTQYF 0.000006 1 1 2
global 115 72 0.039896 2 0.007748 58169.0 TRAV17*01 TRAJ27*01 CATDANADKLTF TRBV19*01 TRBJ2-4*01 CASGQGGQNTQYF 0.000006 1 1 2
global 116 72 0.040130 2 0.007771 58340.0 TRAV17*01 TRAJ27*01 CATDTNADKLTF TRBV19*01 TRBJ2-4*01 CASGAGGQNTQYF 0.000006 1 1 2
global 117 72 0.048591 2 0.008553 64213.0 TRAV17*01 TRAJ27*01 CATDTNADKLTF TRBV19*01 TRBJ2-4*01 CATGQGGQNTQYF 0.000006 1 1 2
global 116 96 0.906476 2 0.037298 280012.0 TRAV17*01 TRAJ27*01 CATDTNADKLTF TRBV19*01 TRBJ2-4*01 CASGAGGQNTQYF 0.000006 1 1 2
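
The Poisson comparison can be illustrated with the tabulated values. In the sketch below, pvalue_adj is taken to be the Poisson tail probability P(X >= num_nbrs) at the expected count, multiplied by the number of clonotypes (334) and the number of radii (4). That multiplier is inferred from the numbers in this table rather than from the CoNGA source, and because expected_num_nbrs is rounded in the table the agreement is only approximate.

```python
from math import exp


def poisson_tail(count, mean, num_terms=60):
    """P(X >= count) for X ~ Poisson(mean), summing pmf terms directly
    (accurate for the small means seen here)."""
    term = exp(-mean)
    for k in range(1, count + 1):
        term *= mean / k  # after the loop, term is the pmf at `count`
    total = 0.0
    k = count
    for _ in range(num_terms):
        total += term
        k += 1
        term *= mean / k
    return total


def clumping_pvalue_adj(num_nbrs, expected_num_nbrs,
                        num_clonotypes=334, num_radii=4):
    # Assumption: correction factor = clonotypes * radii tested.
    return poisson_tail(num_nbrs, expected_num_nbrs) * num_clonotypes * num_radii


if __name__ == "__main__":
    # (num_nbrs, expected_num_nbrs, tabulated pvalue_adj)
    for num_nbrs, expected, tabulated in [
        (2, 0.001052, 0.000739),
        (2, 0.007748, 0.039896),
        (2, 0.037298, 0.906476),
    ]:
        print(clumping_pvalue_adj(num_nbrs, expected), tabulated)
```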

tcr_clumping_logos


This figure summarizes the results of a CoNGA analysis that produces scores (TCR clumping) and clusters. At the top are six 2D UMAP projections of clonotypes in the dataset based on GEX similarity (top left three panels) and TCR similarity (top right three panels), colored from left to right by GEX cluster assignment; TCR clumping score; joint GEX:TCR cluster assignment for clonotypes with significant TCR clumping scores, using a bicolored disk whose left half indicates GEX cluster and whose right half indicates TCR cluster; TCR cluster; TCR clumping; GEX:TCR cluster assignments for TCR clumping hits, as in the third panel.

Below are two rows of GEX landscape plots colored by (first row, left) expression of selected marker genes, (second row, left) Z-score normalized and GEX-neighborhood averaged expression of the same marker genes, and (both rows, right) TCR sequence features (see CoNGA manuscript Table S3 for TCR feature descriptions).

GEX and TCR sequence features of TCR clumping hits in clusters with 3 or more hits are summarized by a series of logo-style visualizations, from left to right: differentially expressed genes (DEGs); TCR sequence logos showing the V and J gene usage and CDR3 sequences for the TCR alpha and beta chains; biased TCR sequence scores, with red indicating elevated scores and blue indicating decreased scores relative to the rest of the dataset (see CoNGA manuscript Table S3 for score definitions); GEX 'logos' for each cluster consisting of a panel of marker genes shown with red disks colored by mean expression and sized according to the fraction of cells expressing the gene (gene names are given above).

DEG and TCRseq sequence logos are scaled by the adjusted P value of the associations, with full logo height requiring a top adjusted P value below 10^-6. DEGs with fold-change less than 2 are shown in gray. Each cluster is indicated by a bicolored disk colored according to GEX cluster (left half) and TCR cluster (right half). The two numbers above each disk show the number of hits within the cluster (on the left) and the total number of cells in those clonotypes (on the right). The dendrogram at the left shows similarity relationships among the clusters based on connections in the GEX and TCR neighbor graphs.

The choice of which marker genes to use for the GEX umap panels and for the cluster GEX logos can be configured using run_conga.py command line flags or arguments to the conga.plotting.make_logo_plots function.
Image source: Cobain_PMBC_Final_tcr_clumping_logos.png

tcr_db_match


This table stores significant matches between TCRs in adata and TCRs in the file /scratch.global/ben_testing/conga/conga/data/new_paired_tcr_db_for_matching_nr.tsv

P values of matches are assigned by turning the raw TCRdist score into a P value based on a model of the V(D)J rearrangement process, so matches between TCRs that are very far from germline (for example) are assigned a higher significance.

Columns:

tcrdist: TCRdist distance between the two TCRs (adata query and db hit)

pvalue_adj: raw P value of the match * num query TCRs * num db TCRs

fdr_value: Benjamini-Hochberg FDR value for match

clone_index: index within adata of the query TCR clonotype

db_index: index of the hit in the database being matched

va,ja,cdr3a,vb,jb,cdr3b: V gene, J gene, and CDR3 sequence for the TCR alpha (a) and beta (b) chains

db_XXX: where XXX is a field in the literature database
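
For readers unfamiliar with the Benjamini-Hochberg procedure behind fdr_value, here is a generic sketch of the standard step-up adjustment. It illustrates the textbook method only; which P values CoNGA passes to it is not stated on this page, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```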



tcr_graph_vs_gex_features


This table has results from a graph-vs-features analysis in which we look for genes that are differentially expressed (elevated) in specific neighborhoods of the TCR neighbor graph. Differential expression is assessed by a t-test first, for speed, and then by a Mann-Whitney U test for neighborhood/score combinations whose t-test P value passes an initial threshold (default is 10 times the P-value threshold).

Each row of the table represents a single significant association, in other words a neighborhood (defined by the central clonotype index) and a gene.

The columns are as follows:

ttest_pvalue_adj: ttest_pvalue * number of comparisons

mwu_pvalue_adj: mannwhitney-U P-value * number of comparisons

log2enr: log2 fold change of gene in neighborhood (will be positive)

gex_cluster: the consensus GEX cluster of the clonotypes w/ biased scores

tcr_cluster: the consensus TCR cluster of the clonotypes w/ biased scores

num_fg: the number of clonotypes in the neighborhood (including center)

mean_fg: the mean value of the feature in the neighborhood

mean_bg: the mean value of the feature outside the neighborhood

feature: the name of the gene

mait_fraction: the fraction of the skewed clonotypes that have an invariant TCR

clone_index: the index in the anndata dataset of the clonotype that is the center of the neighborhood.


ttest_pvalue_adj mwu_pvalue_adj log2enr gex_cluster tcr_cluster feature mean_fg mean_bg num_fg clone_index mait_fraction nbr_frac graph_type feature_type
1.708387e-02 3.054413e-13 4.421697 0 1 ENSMMUG00000043894 2.030560 0.269116 34 21 0.0 0.10 tcr_nbr gex
3.873624e-06 3.224697e-11 6.286960 4 1 ENSMMUG00000057062 1.562478 0.047160 4 294 0.0 0.01 tcr_nbr gex
1.658276e-02 1.098052e-10 3.793049 0 1 ENSMMUG00000043894 1.707525 0.281967 39 -1 0.0 0.00 tcr_cluster gex
3.548419e-01 5.727190e-10 4.120359 0 1 ENSMMUG00000043894 1.902601 0.283618 34 39 0.0 0.10 tcr_nbr gex
4.367775e-01 1.494862e-09 4.040171 0 1 ENSMMUG00000043894 1.868609 0.287470 34 55 0.0 0.10 tcr_nbr gex
8.451434e-01 3.303357e-09 3.895899 0 1 ENSMMUG00000043894 1.807568 0.294388 34 36 0.0 0.10 tcr_nbr gex
1.373750e+00 3.849813e-09 3.819334 0 1 ENSMMUG00000043894 1.775250 0.298051 34 22 0.0 0.10 tcr_nbr gex
2.411306e+00 4.787948e-09 3.770085 0 1 ENSMMUG00000043894 1.754496 0.300403 34 261 0.0 0.10 tcr_nbr gex
9.700939e-02 1.808073e-08 3.882836 1 6 ENSMMUG00000056515 1.852199 0.310625 34 188 0.0 0.10 tcr_nbr gex
3.839936e+00 5.580870e-08 3.777565 0 1 ENSMMUG00000043894 1.757647 0.300046 34 263 0.0 0.10 tcr_nbr gex
1.090397e+00 6.597051e-08 3.895928 0 1 ENSMMUG00000043894 1.807580 0.294387 34 106 0.0 0.10 tcr_nbr gex
8.061568e+00 1.481250e-07 3.608862 0 1 ENSMMUG00000043894 1.686772 0.308078 34 256 0.0 0.10 tcr_nbr gex
3.270032e+00 2.738993e-07 3.668767 0 1 ENSMMUG00000043894 1.711894 0.305231 34 41 0.0 0.10 tcr_nbr gex
4.312230e+00 2.971477e-07 3.630450 0 1 ENSMMUG00000043894 1.695819 0.307053 34 20 0.0 0.10 tcr_nbr gex
6.764464e+00 1.479286e-06 3.696471 0 1 ENSMMUG00000043894 1.723529 0.303913 34 308 0.0 0.10 tcr_nbr gex
7.435404e+00 2.566169e-06 3.633643 0 1 ENSMMUG00000043894 1.697158 0.306901 34 24 0.0 0.10 tcr_nbr gex
5.184455e+00 5.587554e-06 3.604352 4 6 ENSMMUG00000056515 1.732809 0.324156 34 173 0.0 0.10 tcr_nbr gex
1.784688e-01 4.332781e-03 3.913407 4 2 ENSMMUG00000061244 0.811675 0.079799 4 213 0.0 0.01 tcr_nbr gex
3.003855e-01 1.437850e-02 3.705284 4 6 PYROXD2 0.678610 0.071811 4 2 0.0 0.01 tcr_nbr gex
1.713394e-13 1.894107e+00 3.335814 1 8 PRNP 0.865793 0.127839 4 123 0.0 0.01 tcr_nbr gex
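
As a worked example of how log2enr relates to mean_fg and mean_bg, the snippet below recomputes it for three rows of the table. It assumes the means are on a natural-log(1 + x) scale, so they are transformed back with expm1 before taking the ratio. That assumption is inferred from the numbers in this table (the recomputed values agree with the tabulated ones to within rounding) and is not a statement about CoNGA's code.

```python
from math import expm1, log2

# (mean_fg, mean_bg, log2enr) copied from the table above.
ROWS = [
    (2.030560, 0.269116, 4.421697),
    (1.562478, 0.047160, 6.286960),
    (0.865793, 0.127839, 3.335814),
]


def log2_enrichment(mean_fg, mean_bg):
    # Assumption: both means are log(1 + x) values, so undo the log first.
    return log2(expm1(mean_fg) / expm1(mean_bg))


if __name__ == "__main__":
    for mean_fg, mean_bg, tabulated in ROWS:
        print(round(log2_enrichment(mean_fg, mean_bg), 4), tabulated)
```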

tcr_graph_vs_gex_features_plot


This plot summarizes the results of a graph versus features analysis by labeling the clonotypes at the center of each biased neighborhood with the name of the feature biased in that neighborhood. The feature names are drawn in colored boxes whose color is determined by the strength and direction of the feature score bias (from bright red for features that are strongly elevated to bright blue for features that are strongly decreased in the corresponding neighborhoods, relative to the rest of the dataset).

At most one feature (the top scoring) is shown for each clonotype (ie, neighborhood). The UMAP xy coordinates for this plot are stored in adata.obsm['X_tcr_2d']. The score used for ranking correlations is 'mwu_pvalue_adj'. The threshold score for displaying a feature is 1.0. The feature column is 'feature'. Since we also run graph-vs-features using "neighbor" graphs that are defined by clusters, ie where each clonotype is connected to all the other clonotypes in the same cluster, some biased features may be associated with a cluster rather than a specific clonotype. Those features are labeled with a '*' at the end and shown near the centroid of the clonotypes belonging to that cluster.
Image source: Cobain_PMBC_Final_tcr_graph_vs_gex_features_plot.png

tcr_graph_vs_gex_features_panels


Graph-versus-feature analysis was used to identify a set of GEX features that showed biased distributions in TCR neighborhoods. This plot shows the distribution of the top-scoring GEX features on the TCR UMAP 2D landscape. The features are ranked by 'mwu_pvalue_adj', i.e. the Mann-Whitney-Wilcoxon adjusted P value (raw P value * number of comparisons). At most 3 features from clonotype neighborhoods in each (GEX,TCR) cluster pair are shown. The raw scores for each feature are averaged over the K nearest neighbors (K is indicated in the lower right corner of each panel) for each clonotype. The min and max nbr-averaged scores are shown in the upper corners of each panel. Points are plotted in order of increasing feature score.
Image source: Cobain_PMBC_Final_tcr_graph_vs_gex_features_panels.png

tcr_genes_vs_gex_features


This table has results from a graph-vs-features analysis in which we look for genes that are differentially expressed (elevated) in specific neighborhoods of the TCR neighbor graph. Differential expression is assessed by a t-test first, for speed, and then by a Mann-Whitney U test for neighborhood/score combinations whose t-test P value passes an initial threshold (default is 10 times the P-value threshold).

Each row of the table represents a single significant association, in other words a neighborhood (defined by the central clonotype index) and a gene.

The columns are as follows:

ttest_pvalue_adj: ttest_pvalue * number of comparisons

mwu_pvalue_adj: mannwhitney-U P-value * number of comparisons

log2enr: log2 fold change of gene in neighborhood (will be positive)

gex_cluster: the consensus GEX cluster of the clonotypes w/ biased scores

tcr_cluster: the consensus TCR cluster of the clonotypes w/ biased scores

num_fg: the number of clonotypes in the neighborhood (including center)

mean_fg: the mean value of the feature in the neighborhood

mean_bg: the mean value of the feature outside the neighborhood

feature: the name of the gene

mait_fraction: the fraction of the skewed clonotypes that have an invariant TCR

clone_index: the index in the anndata dataset of the clonotype that is the center of the neighborhood.

In this analysis the TCR graph is defined by connecting all clonotypes that have the same VA/JA/VB/JB gene segment (it is run four times, once for each gene segment type).
ttest_pvalue_adj mwu_pvalue_adj log2enr gex_cluster tcr_cluster feature mean_fg mean_bg num_fg clone_index mait_fraction gene_segment graph_type feature_type
5.124009e-03 1.606456e-58 8.960855 0 5 ENSMMUG00000056431 1.746110 0.009452 12 -1 0.0 TRAV35 tcr_genes gex
8.053680e-01 3.525253e-56 10.209968 1 3 ENSMMUG00000060662 2.224591 0.006941 9 -1 0.0 TRAV8-7 tcr_genes gex
2.181633e+00 6.223182e-55 9.943388 4 6 ENSMMUG00000056910 1.765435 0.004908 8 -1 0.0 TRAV16 tcr_genes gex
9.550868e+00 3.100889e-53 11.222490 3 2 ENSMMUG00000062897 2.480570 0.004571 7 -1 0.0 TRBV11-2 tcr_genes gex
1.250800e-02 4.111208e-47 10.347245 1 0 ENSMMUG00000063185 2.745764 0.011128 11 -1 0.0 TRBV4-2 tcr_genes gex
1.375858e-03 1.445244e-45 8.601505 2 7 ENSMMUG00000062085 2.231721 0.021183 14 -1 0.0 TRBV4-3 tcr_genes gex
2.590633e+00 6.687270e-45 9.048521 4 4 ENSMMUG00000062211 2.573418 0.022614 8 -1 0.0 TRBV12-2 tcr_genes gex
3.052696e+00 1.165275e-41 8.634254 1 1 ENSMMUG00000054409 2.001309 0.015975 8 -1 0.0 TRAV6 tcr_genes gex
4.918888e-04 1.571278e-38 8.312560 1 6 ENSMMUG00000065017 2.487195 0.034097 13 -1 0.0 TRAV12-1 tcr_genes gex
8.589368e-01 6.298497e-33 6.324867 1 3 ENSMMUG00000061081 1.297436 0.032642 15 -1 0.0 TRAV8-2 tcr_genes gex
2.704425e-03 1.403611e-32 6.618153 0 1 ENSMMUG00000061119 1.812065 0.050837 11 -1 0.0 TRAV18 tcr_genes gex
9.807201e+00 2.424732e-28 6.521937 0 1 ENSMMUG00000057062 1.173448 0.024010 12 -1 0.0 TRAV8-3 tcr_genes gex
1.165334e-15 8.614557e-26 5.753564 0 1 ENSMMUG00000043894 2.794281 0.250398 26 -1 0.0 TRBV20-1 tcr_genes gex
5.225770e+00 1.983631e-25 7.338211 4 5 ENSMMUG00000059325 1.905148 0.034741 6 -1 0.0 TRAV25 tcr_genes gex
4.799040e-15 4.187372e-16 5.013303 2 3 ENSMMUG00000043894 2.558404 0.314031 20 -1 0.0 TRBV19 tcr_genes gex
2.383471e-09 1.450544e-15 4.860598 4 6 ENSMMUG00000056515 2.514964 0.330186 21 -1 0.0 TRBV6-3 tcr_genes gex
7.005770e+00 5.157767e-14 7.033737 0 0 ENSMMUG00000051385 2.565485 0.087678 6 -1 0.0 TRBV7-4 tcr_genes gex
5.059898e-09 1.102736e-12 5.094420 0 6 ENSMMUG00000056515 2.718871 0.346819 17 -1 0.0 TRBV6-2 tcr_genes gex
3.990887e-05 1.043460e-05 5.330403 4 7 ENSMMUG00000056515 3.024953 0.396732 9 -1 0.0 TRBV10-2 tcr_genes gex
9.417316e+00 7.511109e-04 4.196857 0 1 MROH1 0.755971 0.059776 3 -1 0.0 TRAJ41 tcr_genes gex
4.759225e+00 1.800065e-01 2.534413 1 3 UNC119 0.938094 0.237778 8 -1 0.0 TRAV16 tcr_genes gex
3.526413e-17 1.193222e+00 3.394084 0 2 ADNP 0.667457 0.086450 3 -1 0.0 TRAJ8 tcr_genes gex

tcr_genes_vs_gex_features_panels


Graph-versus-feature analysis was used to identify a set of GEX features that showed biased distributions in TCR neighborhoods. This plot shows the distribution of the top-scoring GEX features on the TCR UMAP 2D landscape. The features are ranked by 'mwu_pvalue_adj', i.e. the Mann-Whitney-Wilcoxon adjusted P value (raw P value * number of comparisons). At most 3 features from clonotype neighborhoods in each (GEX,TCR) cluster pair are shown. The raw scores for each feature are averaged over the K nearest neighbors (K is indicated in the lower right corner of each panel) for each clonotype. The min and max nbr-averaged scores are shown in the upper corners of each panel. Points are plotted in order of increasing feature score.
Image source: Cobain_PMBC_Final_tcr_genes_vs_gex_features_panels.png

gex_graph_vs_tcr_features


This table has results from a graph-vs-features analysis in which we look at the distribution of a set of TCR-defined features over the GEX neighbor graph. We look for neighborhoods in the graph that have biased score distributions, as assessed by a t-test first, for speed, and then by a Mann-Whitney U test for neighborhood/score combinations whose t-test P value passes an initial threshold (default is 10 times the P-value threshold).

Each row of the table represents a single significant association, in other words a neighborhood (defined by the central clonotype index) and a tcr feature.

The columns are as follows:

ttest_pvalue_adj: ttest_pvalue * number of comparisons

ttest_stat: ttest statistic (sign indicates whether the feature is up or down)

mwu_pvalue_adj: mannwhitney-U P-value * number of comparisons

gex_cluster: the consensus GEX cluster of the clonotypes w/ biased scores

tcr_cluster: the consensus TCR cluster of the clonotypes w/ biased scores

num_fg: the number of clonotypes in the neighborhood (including center)

mean_fg: the mean value of the feature in the neighborhood

mean_bg: the mean value of the feature outside the neighborhood

feature: the name of the TCR score

mait_fraction: the fraction of the skewed clonotypes that have an invariant TCR

clone_index: the index in the anndata dataset of the clonotype that is the center of the neighborhood.


nbr_frac graph_type ttest_pvalue_adj ttest_stat mwu_pvalue_adj gex_cluster tcr_cluster num_fg mean_fg mean_bg feature mait_fraction clone_index feature_type
0.0 gex_cluster 0.025397 4.382192 0.020492 3.0 9.0 54.0 0.284322 -0.019200 cd8 0.0 -1.0 tcr
0.1 gex_nbr 0.152882 5.357859 0.070400 3.0 4.0 34.0 0.416194 -0.013910 cd8 0.0 73.0 tcr
0.0 gex_cluster 0.154045 -3.746651 1.960642 0.0 1.0 95.0 -0.108843 0.085011 cd8 0.0 -1.0 tcr

gex_graph_vs_tcr_features_plot


This plot summarizes the results of a graph versus features analysis by labeling the clonotypes at the center of each biased neighborhood with the name of the feature biased in that neighborhood. The feature names are drawn in colored boxes whose color is determined by the strength and direction of the feature score bias (from bright red for features that are strongly elevated to bright blue for features that are strongly decreased in the corresponding neighborhoods, relative to the rest of the dataset).

At most one feature (the top scoring) is shown for each clonotype (ie, neighborhood). The UMAP xy coordinates for this plot are stored in adata.obsm['X_gex_2d']. The score used for ranking correlations is 'mwu_pvalue_adj'. The threshold score for displaying a feature is 1.0. The feature column is 'feature'. Since we also run graph-vs-features using "neighbor" graphs that are defined by clusters, ie where each clonotype is connected to all the other clonotypes in the same cluster, some biased features may be associated with a cluster rather than a specific clonotype. Those features are labeled with a '*' at the end and shown near the centroid of the clonotypes belonging to that cluster.
Image source: Cobain_PMBC_Final_gex_graph_vs_tcr_features_plot.png

gex_graph_vs_tcr_features_panels


Graph-versus-feature analysis was used to identify a set of TCR features that showed biased distributions in GEX neighborhoods. This plot shows the distribution of the top-scoring TCR features on the GEX UMAP 2D landscape. The features are ranked by 'mwu_pvalue_adj', i.e. the Mann-Whitney-Wilcoxon adjusted P value (raw P value * number of comparisons). At most 3 features from clonotype neighborhoods in each (GEX,TCR) cluster pair are shown. The raw scores for each feature are averaged over the K nearest neighbors (K is indicated in the lower right corner of each panel) for each clonotype. The min and max nbr-averaged scores are shown in the upper corners of each panel. Points are plotted in order of increasing feature score.
Image source: Cobain_PMBC_Final_gex_graph_vs_tcr_features_panels.png

graph_vs_features_gex_clustermap


This plot shows the distribution of significant features from graph-vs-features or HotSpot analysis plotted across the GEX landscape. Rows are features and columns are individual clonotypes. Columns are ordered by hierarchical clustering (if a dendrogram is present above the heatmap) or by a 1D UMAP projection (used for very large datasets or if 'X_pca_gex' is not present in adata.obsm_keys()). Rows are ordered by hierarchical clustering with a correlation metric.

The row colors to the left of the heatmap show the feature type (blue=TCR, orange=GEX). The row colors to the left of those indicate the strength of the graph-vs-feature correlation (also included in the feature labels to the right of the heatmap; keep in mind that highly significant P values for some features may shift the colorscale so everything else looks dark blue).

The column colors above the heatmap are GEX clusters (and TCR V/J genes if plotting against the TCR landscape). The text above the column colors provides more info.

Feature scores are Z-score normalized and then averaged over the K=33 nearest neighbors (0 means no nbr-averaging).

The 'coolwarm' colormap is centered at Z=0.

Since features of the same type (GEX or TCR) as the landscape and neighbor graph (ie GEX features) are more highly correlated over graph neighborhoods, their neighbor-averaged scores will show more extreme variation. For this reason, the nbr-averaged scores for these features from the same modality as the landscape itself are downscaled by a factor of rescale_factor_for_self_features=0.33.

The colormap in the top left is for the Z-score normalized, neighbor-averaged scores (multiply by 3.03 to get the color scores for the GEX features).
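
The "multiply by 3.03" instruction is simply the inverse of the rescale factor. A minimal sketch of the arithmetic, with hypothetical function names and assuming the rescaling is a plain multiplication applied only to features from the same modality as the landscape:

```python
RESCALE_FACTOR_FOR_SELF_FEATURES = 0.33  # value reported above


def displayed_score(nbr_avg_zscore, same_modality):
    """Score as drawn in the heatmap; self-modality features are downscaled."""
    if same_modality:
        return nbr_avg_zscore * RESCALE_FACTOR_FOR_SELF_FEATURES
    return nbr_avg_zscore


def original_score(displayed, same_modality):
    """Undo the downscaling to recover the neighbor-averaged Z-score."""
    if same_modality:
        return displayed / RESCALE_FACTOR_FOR_SELF_FEATURES
    return displayed


if __name__ == "__main__":
    print(round(1 / RESCALE_FACTOR_FOR_SELF_FEATURES, 2))  # 3.03
```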


Image source: Cobain_PMBC_Final_graph_vs_features_gex_clustermap.png

graph_vs_features_tcr_clustermap


This plot shows the distribution of significant features from graph-vs-features or HotSpot analysis plotted across the TCR landscape. Rows are features and columns are individual clonotypes. Columns are ordered by hierarchical clustering (if a dendrogram is present above the heatmap) or by a 1D UMAP projection (used for very large datasets or if 'X_pca_tcr' is not present in adata.obsm_keys()). Rows are ordered by hierarchical clustering with a correlation metric.

The row colors to the left of the heatmap show the feature type (blue=TCR, orange=GEX). The row colors to the left of those indicate the strength of the graph-vs-feature correlation (also included in the feature labels to the right of the heatmap; keep in mind that highly significant P values for some features may shift the colorscale so everything else looks dark blue).

The column colors above the heatmap are TCR clusters (and TCR V/J genes if plotting against the TCR landscape). The text above the column colors provides more info.

Feature scores are Z-score normalized and then averaged over the K=33 nearest neighbors (0 means no nbr-averaging).

The 'coolwarm' colormap is centered at Z=0.

Since features of the same type (GEX or TCR) as the landscape and neighbor graph (ie TCR features) are more highly correlated over graph neighborhoods, their neighbor-averaged scores will show more extreme variation. For this reason, the nbr-averaged scores for these features from the same modality as the landscape itself are downscaled by a factor of rescale_factor_for_self_features=0.33.

The colormap in the top left is for the Z-score normalized, neighbor-averaged scores (multiply by 3.03 to get the color scores for the TCR features).


Image source: Cobain_PMBC_Final_graph_vs_features_tcr_clustermap.png

graph_vs_summary


Summary figure for the graph-vs-graph and graph-vs-features analyses.
Image source: Cobain_PMBC_Final_graph_vs_summary.png

gex_clusters_tcrdist_trees


These are TCRdist hierarchical clustering trees for the GEX clusters (cluster assignments stored in adata.obs['clusters_gex']). The trees are colored by CoNGA score with a color score range of 3.34e+00 (blue) to 3.34e-09 (red). For coloring, CoNGA scores are log-transformed, negated, and square-rooted (with an offset in there, too, roughly speaking).
Image source: Cobain_PMBC_Final_gex_clusters_tcrdist_trees.png

conga_threshold_tcrdist_tree


This is a TCRdist hierarchical clustering tree for the clonotypes with CoNGA score less than 10.0. The tree is colored by CoNGA score with a color score range of 1.00e+01 (blue) to 1.00e-08 (red). For coloring, CoNGA scores are log-transformed, negated, and square-rooted (with an offset in there, too, roughly speaking).
Image source: Cobain_PMBC_Final_conga_threshold_tcrdist_tree.png

hotspot_features


Find GEX (TCR) features that show a biased distribution across the TCR (GEX) neighbor graph, using a simplified version of the Hotspot method from the Yosef lab.

DeTomaso, D., & Yosef, N. (2021). "Hotspot identifies informative gene modules across modalities of single-cell genomics." Cell Systems, 12(5), 446–456.e9.

PMID:33951459

Columns:

Z: HotSpot Z statistic

pvalue_adj: Raw P value times the number of tests (crude Bonferroni correction)

nbr_frac: The KNN neighbor fraction used for the neighbor graph construction (nbr_frac = 0.1 means K = 0.1 * num_clonotypes neighbors)


Z pvalue_adj feature feature_type nbr_frac
27.586393 9.167974e-164 ENSMMUG00000043894 gex 0.10
23.681303 3.179214e-120 ENSMMUG00000056515 gex 0.10
15.079441 1.251061e-47 ENSMMUG00000056431 gex 0.10
14.223775 3.690791e-42 ENSMMUG00000056431 gex 0.01
13.224135 3.601881e-36 ENSMMUG00000043894 gex 0.01
12.997962 7.108452e-35 ENSMMUG00000063185 gex 0.10
11.953443 4.484875e-31 mait tcr 0.01
11.800132 2.204734e-28 ENSMMUG00000052673 gex 0.01
11.201615 2.265080e-25 ENSMMUG00000061081 gex 0.10
10.450515 8.244776e-22 ENSMMUG00000060662 gex 0.10
9.959545 1.296188e-19 ENSMMUG00000056515 gex 0.01
9.508307 1.096371e-17 ETV7 gex 0.01
9.247648 1.298316e-16 ENSMMUG00000052673 gex 0.10
8.453784 1.595474e-13 ENSMMUG00000062085 gex 0.10
8.448200 1.673649e-13 ENSMMUG00000065017 gex 0.10
8.046012 4.839191e-12 ENSMMUG00000057062 gex 0.01
8.014384 6.262314e-12 ENSMMUG00000061119 gex 0.10
7.351325 1.110297e-09 ENSMMUG00000054409 gex 0.10
6.583303 3.312783e-09 TRAV1-2 tcr 0.01
7.184499 3.816208e-09 ENSMMUG00000056910 gex 0.10
7.177769 4.008814e-09 ENSMMUG00000061081 gex 0.01
7.024382 1.216448e-08 ENSMMUG00000057062 gex 0.10
6.970058 1.792350e-08 ENSMMUG00000062211 gex 0.10
6.217419 2.859286e-06 ENSMMUG00000062211 gex 0.01
6.176465 3.708643e-06 ANK3 gex 0.01
5.915825 1.868192e-05 ENSMMUG00000054409 gex 0.01
5.778524 4.263431e-05 DES gex 0.01
5.493606 2.227971e-04 ENSMMUG00000065017 gex 0.01
5.490045 2.273355e-04 ENSMMUG00000059325 gex 0.01
5.441001 2.997461e-04 CCDC126 gex 0.01
5.427365 3.235651e-04 ENSMMUG00000054661 gex 0.01
4.550036 3.861840e-04 mait tcr 0.10
5.381459 4.180026e-04 DNASE1L2 gex 0.01
5.343634 5.154062e-04 ENSMMUG00000062897 gex 0.10
4.474798 5.506820e-04 cd8 tcr 0.01
5.310930 6.170489e-04 ZNF516 gex 0.01
5.306390 6.326073e-04 COX1 gex 0.10
5.244228 8.878749e-04 ENSMMUG00000062085 gex 0.01
5.234169 9.376103e-04 KBTBD7 gex 0.01
5.224536 9.877549e-04 ENSMMUG00000056910 gex 0.01
5.211452 1.060038e-03 MAP3K8 gex 0.01
5.132538 1.617258e-03 AGMAT gex 0.01
5.070399 2.245934e-03 ENSMMUG00000061119 gex 0.01
4.955759 4.076151e-03 NFKBID gex 0.01
4.793489 9.272968e-03 CCT6B gex 0.01
4.745807 1.174932e-02 ENSMMUG00000061234 gex 0.01
3.762298 1.212189e-02 TRBV14 tcr 0.01
4.701150 1.463595e-02 ENSMMUG00000060662 gex 0.01
4.639234 1.978445e-02 ERH gex 0.10
4.623781 2.131790e-02 ZSWIM3 gex 0.01
Omitted 5 lines
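
The pvalue_adj and nbr_frac columns in the table above follow simple arithmetic, sketched below. The helper names bonferroni_adjust and num_neighbors are hypothetical, and truncating K to an integer with a floor of 1 is an assumption. The clonotype count of 334 comes from the Stats section of this page. The number of tests is not reported on this page, so the value used in the usage example is a placeholder.

```python
def bonferroni_adjust(raw_pvalue, num_tests):
    # Crude Bonferroni correction as described in the column legend:
    # raw P value times the number of tests (no capping at 1 is applied here).
    return raw_pvalue * num_tests


def num_neighbors(nbr_frac, num_clonotypes):
    # K for the KNN graph as a fraction of the clonotype count.
    # Truncation and the minimum of 1 are assumptions of this sketch.
    return max(1, int(nbr_frac * num_clonotypes))


if __name__ == "__main__":
    num_clonotypes = 334  # num_clonotypes from the Stats section
    for frac in (0.01, 0.1):
        print(frac, num_neighbors(frac, num_clonotypes))
    # Placeholder number of tests, for illustration only.
    print(bonferroni_adjust(1e-6, 1000))
```

With 334 clonotypes, nbr_frac = 0.1 gives K = 33, which matches the K=33 quoted for the clustermaps on this page.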

hotspot_gex_umap


HotSpot analysis (Nir Yosef lab, PMID: 33951459) was used to identify a set of GEX (TCR) features that showed biased distributions in TCR (GEX) space. This plot shows the distribution of the top-scoring HotSpot features on the GEX UMAP 2D landscape. The features are ranked by adjusted P value (raw P value * number of comparisons). The raw scores for each feature are averaged over the K nearest neighbors (K is indicated in the lower right corner of each panel) for each clonotype. The min and max nbr-averaged scores are shown in the upper corners of each panel.

Features are filtered based on correlation coefficient to reduce redundancy: if a feature has a correlation of >= 0.9 (the max_feature_correlation argument to conga.plotting.plot_hotspot_umap) with a previously plotted feature, that feature is skipped. Points are plotted in order of increasing feature score.
Image source: Cobain_PMBC_Final_hotspot_combo_features_0.100_nbrs_gex_plot_umap_nbr_avg.png
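
The redundancy filter described above amounts to a greedy pass over the ranked features. The sketch below is an illustration under stated assumptions, not the code in conga.plotting: the function names are hypothetical, Pearson correlation is assumed, and the signed (not absolute) correlation is compared against the threshold. Only the 0.9 default comes from this page.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def filter_redundant_features(ranked_features, scores, max_feature_correlation=0.9):
    """Keep features in rank order, skipping any that correlate with a kept one.

    ranked_features: feature names, best (lowest adjusted P value) first.
    scores: dict mapping feature name -> per-clonotype score list.
    """
    kept = []
    for feature in ranked_features:
        redundant = any(
            pearson(scores[feature], scores[prev]) >= max_feature_correlation
            for prev in kept
        )
        if not redundant:
            kept.append(feature)
    return kept
```

Because the pass is greedy, the outcome depends on the ranking: the better-ranked member of a correlated pair is always the one that is plotted.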

hotspot_gex_clustermap


This plot shows the distribution of significant features from graph-vs-features or HotSpot analysis plotted across the GEX landscape. Rows are features and columns are individual clonotypes. Columns are ordered by hierarchical clustering (if a dendrogram is present above the heatmap) or by a 1D UMAP projection (used for very large datasets or if 'X_pca_gex' is not present in adata.obsm_keys()). Rows are ordered by hierarchical clustering with a correlation metric.

The row colors to the left of the heatmap show the feature type (blue=TCR, orange=GEX). The row colors to the left of those indicate the strength of the graph-vs-feature correlation (also included in the feature labels to the right of the heatmap; keep in mind that highly significant P values for some features may shift the colorscale so everything else looks dark blue).

The column colors above the heatmap are GEX clusters (and TCR V/J genes if plotting against the TCR landscape). The text above the column colors provides more info.

Feature scores are Z-score normalized and then averaged over the K=33 nearest neighbors (0 means no nbr-averaging).
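
As a rough illustration of the normalization and neighbor-averaging step, the sketch below Z-scores one feature and then averages each clonotype's value with those of its neighbors. The function names are hypothetical, and two details are assumptions not stated on this page: the population standard deviation is used, and each clonotype is included in its own average. An empty neighbor list corresponds to the "0 means no nbr-averaging" case.

```python
import statistics


def zscore(values):
    """Z-score normalize a list of feature scores (all zeros if constant)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD: an assumption of this sketch
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mean) / sd for v in values]


def neighbor_average(values, neighbors):
    """Average each clonotype's value with the values of its K nearest neighbors.

    neighbors[i] is the list of neighbor indices for clonotype i; including
    clonotype i itself in the average is an assumption of this sketch.
    """
    averaged = []
    for i, nbrs in enumerate(neighbors):
        total = values[i] + sum(values[j] for j in nbrs)
        averaged.append(total / (len(nbrs) + 1))
    return averaged


if __name__ == "__main__":
    raw = [1.0, 2.0, 3.0]
    nbrs = [[1], [0, 2], [1]]  # toy neighbor graph
    print(neighbor_average(zscore(raw), nbrs))
```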

The 'coolwarm' colormap is centered at Z=0.

Since features of the same type (GEX or TCR) as the landscape and neighbor graph (i.e., GEX features) are more highly correlated over graph neighborhoods, their neighbor-averaged scores will show more extreme variation. For this reason, the nbr-averaged scores for these features from the same modality as the landscape itself are downscaled by a factor of rescale_factor_for_self_features=0.33.

The colormap in the top left is for the Z-score normalized, neighbor-averaged scores (multiply by 3.03 to get the color scores for the GEX features).
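
The factor of 3.03 is simply the inverse of the 0.33 downscaling applied to same-modality features. The sketch below shows the arithmetic; the function names plotted_value and colorbar_to_self_score are hypothetical, and only the 0.33 factor comes from this page.

```python
RESCALE_FACTOR_FOR_SELF_FEATURES = 0.33  # value reported on this page


def plotted_value(nbr_avg_z, is_self_feature, rescale=RESCALE_FACTOR_FOR_SELF_FEATURES):
    """Value drawn in the heatmap: self-modality features are downscaled."""
    return nbr_avg_z * rescale if is_self_feature else nbr_avg_z


def colorbar_to_self_score(color_value, rescale=RESCALE_FACTOR_FOR_SELF_FEATURES):
    """Convert a colorbar reading back to the score of a self-modality feature."""
    return color_value / rescale


if __name__ == "__main__":
    print(round(1 / RESCALE_FACTOR_FOR_SELF_FEATURES, 2))  # 3.03
```

For example, a colorbar reading of 1.0 on a same-modality row corresponds to a neighbor-averaged Z-score of about 3.03.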


Image source: Cobain_PMBC_Final_hotspot_combo_features_0.100_nbrs_gex_plot_clustermap_nbr_avg.png

hotspot_tcr_umap


HotSpot analysis (Nir Yosef lab, PMID: 33951459) was used to identify a set of GEX (TCR) features that showed biased distributions in TCR (GEX) space. This plot shows the distribution of the top-scoring HotSpot features on the TCR UMAP 2D landscape. The features are ranked by adjusted P value (raw P value * number of comparisons). The raw scores for each feature are averaged over the K nearest neighbors (K is indicated in the lower right corner of each panel) for each clonotype. The min and max nbr-averaged scores are shown in the upper corners of each panel.

Features are filtered based on correlation coefficient to reduce redundancy: if a feature has a correlation of >= 0.9 (the max_feature_correlation argument to conga.plotting.plot_hotspot_umap) with a previously plotted feature, that feature is skipped. Points are plotted in order of increasing feature score.
Image source: Cobain_PMBC_Final_hotspot_combo_features_0.100_nbrs_tcr_plot_umap_nbr_avg.png

hotspot_tcr_clustermap


This plot shows the distribution of significant features from graph-vs-features or HotSpot analysis plotted across the TCR landscape. Rows are features and columns are individual clonotypes. Columns are ordered by hierarchical clustering (if a dendrogram is present above the heatmap) or by a 1D UMAP projection (used for very large datasets or if 'X_pca_tcr' is not present in adata.obsm_keys()). Rows are ordered by hierarchical clustering with a correlation metric.

The row colors to the left of the heatmap show the feature type (blue=TCR, orange=GEX). The row colors to the left of those indicate the strength of the graph-vs-feature correlation (also included in the feature labels to the right of the heatmap; keep in mind that highly significant P values for some features may shift the colorscale so everything else looks dark blue).

The column colors above the heatmap are TCR clusters (and TCR V/J genes if plotting against the TCR landscape). The text above the column colors provides more info.

Feature scores are Z-score normalized and then averaged over the K=33 nearest neighbors (0 means no nbr-averaging).

The 'coolwarm' colormap is centered at Z=0.

Since features of the same type (GEX or TCR) as the landscape and neighbor graph (i.e., TCR features) are more highly correlated over graph neighborhoods, their neighbor-averaged scores will show more extreme variation. For this reason, the nbr-averaged scores for these features from the same modality as the landscape itself are downscaled by a factor of rescale_factor_for_self_features=0.33.

The colormap in the top left is for the Z-score normalized, neighbor-averaged scores (multiply by 3.03 to get the color scores for the TCR features).


Image source: Cobain_PMBC_Final_hotspot_combo_features_0.100_nbrs_tcr_plot_clustermap_nbr_avg.png